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Abstract: Many extensions to the Standard Model of particle physics hypothesize the existence 
of new low-mass particles. Typically there are few theoretical constraints on the mass or lifetime 
of such particles. This requires the experimentalist to perform a search in which both the mass 
and lifetime of the particle are unknown. Such searches for low-mass particles are complicated by 
the possible presence of resonances and other non-monotonic backgrounds. This paper presents a 
simple and fast approach to assigning significance and setting limits in such searches. 
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1. Introduction 

Many extensions to the Standard Model (SM) of particle physics hypothesize the existence of new 
low-mass particles. Typically there are few theoretical constraints on the mass or lifetime of such 
particles. Therefore, the experimentalist is required to perform a search in which both the mass and 
lifetime of the particle are unknown. Such searches for low-mass particles are complicated by the 
possible presence of resonances and other non-monotonic backgrounds whose probability density 
functions (PDFs) are not well known or constrained. 

The null results from numerous searches for new particles that decay into leptons has shifted 
the focus of many theorists towards leptophobic bosons. For example, Tulin []TI] proposes a force 
that couples weakly to quarks at the QCD scale. Experimentally, this requires a search for a MeV 
or GeV mass boson that decays to If such a boson decays promptly*, then searching for 

'Throughout this paper I refer to decays as either: (prompt) the separation of the decay point from the production 
point is too small to be resolved by the detector and (displaced) this separation is resolvable. 
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Figure 1. Simulated PDF including interference between B K*ji^ and B —> K*p[ii^) for two arbitrary 
phase differences between the two amplitudes (shown as solid and dashed lines). 


it requires dealing with an irreducible resonance background. It is preferable to perform this search 
with the data blinded; however, the reaction under study is governed by non-perturbative QCD 
and so the only a priori assumption about the SM PDF that can be made is: any resonance that 
couples to the same final state as the new boson may contribute to the SM background. The number 
of known resonances is large and new resonances are still being discovered, even those that only 
contain light quarks [0]. Furthermore, parametrization of resonances is reaction-dependent and 
interference effects are often significant. Therefore, the number of nuisance parameters in the 
background PDF is very large and there will likely always be some component to the PDF that is 
simply not modeled accurately to better than a few percent. Even in dedicated amplitude analyses 
performed without blinding the data, it is often difficult to evaluate the systematic uncertainties 
associated with the fit PDF due to the large number of nuisance parameters. It is also difficult to 
obtain a fit model which describes the data at the percent level for all masses. 

Seaches for new bosons decaying to leptons are not immune to the problem of resonances. 
Consider, e.g., the rare decay B ^ K*pii which is an excellent laboratory to search for a new 
low-mass boson that couples to mass [0. This decay can have a contribution from B — K*p{iiiJL). 
Using the published LHCb B —)• results 0] and the ratio of branching fractions i%{B — 

K*p)^{p —^ pp) / ^{B ^ K* pp) from the PDG 0 one obtains a predictedy ieldofB 
that is less than one event; however, this calculation ignores interference. Figure [I] shows a toy 
prediction obtained using the measured branching fractions ^{B ^ K*pp), ^^{B ^ K*p) and 
^{p —7- pp), and the assumption that the full amplitudes for B — K*pp and B — K*p{pp) in¬ 
terfere^. Naively using the ratio of branching fractions suggests B —)■ K*p{pp) is negligible, but 
the interference cross term could be large enough to generate a “local” 5% effect. In principle the 
p contribution could be parametrized (with some uncertainty); however, the branching fractions of 
many resonance decays to p^p^ and of B ^ K* resonance have yet to be measured and so such 
contributions would be unconstrained. Since the effect generated can also depend on interference, a 
full model allowing all resonances to interfere (which requires introducing unknown relative phases 
for each resonance) must be constructed. 

ignore muon spin states here which produces an overestimate of the size of this interference effect, but for illustra¬ 
tive purposes this model is sufficient. 
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For the case of displaced decays, the largest background contributions are often due to some 
mis-reconstructed candidates. These include material interactions, photon conversions, etc. Given 
that the searches considered in this paper must scan a large mass and lifetime range, predicting 
(blindly) the full list of displaced backgrounds and parametrizing them will often not be possible. 
Furthermore, resonances may contribute to the displaced search via decays of beauty or charm 
particles. E.g., in the aforementioned B —> K*ixix search, it is possible to obtain the K* and 
from two different B decays at LHCb. This could result in a candidate that contains what appears 
to be a long-lived charmonium resonance. It is desirable to approach such a search by making as 
few assumptions as possible about the backgrounds. 

This paper presents a simple approach for performing a search for a new low-mass particle 
of unknown mass and lifetime. The method is described in Sec. 2. Obtaining global p-values, 
including the so-called trials factor or look elsewhere effect, is discussed in Sec. 3, while setting 
upper limits and the coverage properties of such limits is discussed in Sec. 4. Discussion and 
summarizing are provided in Sec. 5. 

2. Search Strategy 

The strategy that will be employed is to perform a scan in mass since the new particle mass is not 
known. The step size of the scan will be a(m)/2, where a(m) is the detector mass resolution. For 
each test mass value a search will be conducted for evidence of a particle with no constraints on its 
lifetime. For illustrative purposes, the following toy model is used: 

• the searched region in mass is 1 GeV wide^; 

• the mass resolution is taken to be o{m) = 2 MeV, giving a step size of 1 MeV; 

• the expected number of prompt events (in the absence of signal) is 1000; 

• the expected number of displaced events (in the absence of signal) is 100; 

• both prompt and displaced events are generated uniformly in mass. 

This toy data set is sufficient for illustrating how the proposed method works. Some variations are 
considered later in the text. 

2.1 Mass 

As an alternative to fitting the mass distribution in data with a PDF whose accuracy is a priori 
unknown and difficult to validate, I propose the following simple approach: For each test mass 
value in the scan, m(test), use the mass sidebands to estimate the expected background"* yield. The 
signal region (where signal events would be observed if a new particle of mass close to /n(test) 

just take the mass region searched to be 0 to 1000 MeV since the absolute mass values do not matter; i.e., shifting 
the mass window to a range that starts at some allowed kinematic limit has no affect on applying this method. For this 
toy study I also do not specify what the decay products of the particle being searched for are since these also do not 
matter apart from identifying which resonances may contribute to the data. 

^Any candidate that does not come from the decay of a new particle is considered as a background in the search. 
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exists) is defined as |m(test) —m\< 2o{m), while the background region is the sidebands defined 
as 3o{m) < |ni(test) —m\< {2x + 'i)o{m), where x is fhe ratio of the size of the signal to sideband 
regions (these regions do not need to be the same size). The factors that contribute to choosing x 
are discussed at the end of this subsection. 

If the background PDF is approximately linear in the region |m(test) —m\ < (2x + 3)a(m), 
then the observed yield in the background region provides an estimate of the expected yield in 
the signal region. The presence of resonances and some types of background will violate this so- 
called local-linear approximation; this is dealt with below. Under the local-linear assumption, the 
likelihood is given by 

L{ns,nh\s,b) = ^{ris.s + b) ■ ^{nh,xb), ( 2 . 1 ) 

where ^ denotes the Poisson PDF, denote the yields in the (signal, background) regions and 
s and b are the signal and background rates in the signal region. It is straightforward to account 
for uncertainty in the relationship between the sideband and search window background yields, or 
equivalently in deviations from local linearity in the background PDF, by augmenting the likelihood 
as follows: 

L{ns,nb,x\s,b,y) = ^{n^.s + b) ■ ^{nb,yb) ■^{y,x,ay), (2.2) 

where denotes the Gaussian PDF and Gy is the uncertainty on the scaling factor between the 
background yield in the signal and background regions^. If one can estimate the size of possible 
deviations from local linearity, e.g., due to resonance contributions, then these can be incorporated 
into the likelihood via Gy. The profile likelihood can then be used to obtain the significance of any 
excess of events (see, e.g.. Ref. [0) and/or to set limits [0]. See Appendix A for a more detailed 
discussion on the likelihood. 

In the previous paragraph the nominal background PDF is taken to be linear but in principle any 
background PDF can be used. For a low-statistics search the linear approximation will be sufficient; 
however, for cases where the sample sizes are large, then the uncertainty due to deviations from 
local-linearity may be large compared to the statistical uncertainties. All that is required to use this 
method is that based on the observed yields in the sidebands, an estimate for the expected yield in 
the signal region can be obtained. If a non-linear background PDF is chosen as nominal, then x 
represents the scaling factor between the expected background yields in the sidebands and signal 
region, which may no longer be simply the ratio of the size of the regions. The Gy term is again 
the uncertainty in the scale factor (independent of how x is defined). See Sec. 5 for discussion on 
non-linear background PDFs. For the remainder of the method overview, I will assume a nominal 
background PDF of local linear. 

Appendix B provides a study of the deviation from local linearity due to resonance contribu¬ 
tions. These deviations are a function of the width of the resonance, F, the fraction of the total PDF 
near m(test) due to the resonance, and of a(m) and x. The conclusions are as follows: 

• for F/a(m) > 20, even if the resonance makes up close to 100% of the PDF the choice x = 1 
is still local-linear to about 10% (wide resonances are safe)', 

^This likelihood can be used for any value of rib', however, Appendix A shows that for the case where iij, is large the 
background region Poisson term can be replaced with a Gaussian one which results in a much faster algorithm. 


- 4 - 




n, iij 


Figure 2. The p-value vi iig for (left) b = 10, (middle) b = 100, (right) b — 1000 for (solid black) b known 
with no uncertainty, (dotted red) x = 1 and (solid red) x = 5. 


• for r/a{m) < 5, the deviation from local-linear is large unless the resonance contribution is 
small (narrow resonances must be vetoed); 

• for 5 < r/a{m) < 20, the local-linear approximation is valid at the 10% level for moderate 
resonance contributions, but not valid if the resonance is dominant. 

Assuming a ^(10%) value is chosen for Gy, then wide resonances can effectively be ignored (in¬ 
cluding those that have yet to be discovered). If nothing is known about the possible size of a 
contribution from a narrow resonance, then the region near the resonance mass should be vetoed. 
If some limits can be placed on the size of a resonance contribution, then this veto may not be 
required. Such limits must be determined in each analysis independently. 

The key point is that narrow resonances are typically well known (including their branching 
fractions to many final states), whereas wide resonances are not well known. The sideband ap¬ 
proach allows the analyst to ignore wide resonances by accounting for their non-linear effects via 
the Gy term in the likelihood. Narrow resonances likely must be vetoed, but there are few of these 
and their properties are well measured. Intermediate-width resonances are also accounted for au¬ 
tomatically by Gy provided they do not dominate the local PDF. Such cases should be rare and can 
be studied using alternative decays of the resonance or via the data directly using bins ff{l0G{m)) 
wide. 

Other categories of background that have a broad peaking structure are also handled naturally 
in this approach. The study in Appendix B can be applied to non-resonant backgrounds with the 
same conclusions drawn: only a background whose peak is narrow relative to G{m) must be vetoed; 
all other backgrounds are accounted for via Gy (the analyst does not need to know what these are 
or account for them individually). E.g., in the B —search, partially reconstructed charm 
particle decays can be ignored, while any J ^ contribution must be vetoed. 

The parameter x should be optimized for each analysis. The larger x is chosen to be, the less 
statistical uncertainty there is on the background rate; however, this also increases the size of the 
region in which local-linearity is assumed. Figure @ shows a comparison of using sidebands to 
estimate b to when b is known (which I assume here is not possible; this is added for illustrative 
purposes to show the best possible performance which could be obtained if b could be known). 
Setting Gy/x = 0.1 typically loses very little power (at most 10-20% sensitivity to the signal rate 
relative to the ideal situation of a known background PDF), so a good rule of thumb would be to 
choose X as large as possible such that Gy/x = 0.1 is still valid. 
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2.2 Lifetime 


I now proceed to considering lifetime, T, information and assume that the background can be 
factorized into two components: (1) a prompt background where the signal candidate’s children all 
originate from the same spatial point as the rest of the final state and (2) a displaced background 
where they do not. The lifetime PDF for type (1) is given by the detector resolution. The lifetime 
PDF for type (2) is assumed to be unknown. In principle it may be obtainable from the mass 
sidebands; however, looking at these sidebands is forbidden in a blind analysis when the mass 
is unknown. Furthermore, in many cases the displaced background data will be sparse making 
obtaining a reliable estimate of its PDF (which likely depends on m) impossible even if the data is 
not blinded. 

If the lifetime distribution of both prompt and displaced backgrounds is the same in the signal 
and background regions, then a non-parametric two-sample goodness-of-fit test can be used to test 
the null hypothesis at each m(test). Appendix C discusses several such distribution-free tests, while 
Appendix D shows the results of applying these to the toy data set described above. The conclusion 
of these studies is that such tests are far from optimal and require introducing the assumption that 
the displaced background T PDF does not vary with m (which may not be true and would be difficult 
to validate). 

An alternative (and simple) approach is to define fwo T regions af each m(fesf): (1) a prompf 
region {e.g., T < 3a(T)) and (2) a displaced region (t > 3a(T)). The mass sidebands in each region 
can be used fo esfimafe fhe expecfed background rale. The join! likelihood is defined as 

L(n^■"'’^ = (2-3) 

i.e., fhe likelihood is fhe producf of fhe likelihoods from each region individually (each obfained 
in fhe same manner as discussed in fhe previous secfion). The profile likelihood (A), which is 
fhe ratio of fhe maximum likelihood wifh fhe signal rale fixed lo fhe maximum likelihood wilh 
fhe signal rale free lo vary (see Appendix A), for Ibis Iwo-region lesl is fhe producf of fhe profile 
likelihoods from each region. This is Irue because no assumplion is made aboul fhe lifelime of fhe 
new parlicle and so Ihere is no assumed relalionship aboul fhe signal yields in each region. The 
asymptotic dislribulion of —2 log A is a wilh fwo DOFs. Appendix D shows lhal Ibis simple 
approach, which only uses fhe lifetime information to delermine which region each candidate falls 
in, is nearly optimal excepl when T ~ a(T). 

The approach presenled in Ihis paper only assumes lhal varialion in fhe lifelime dislribulion vs 
mass is slow enough lhal Ihe linear approximation holds in bolh Ihe prompl and displaced regions. 
The possible deviations from linearity are accounted for by ay. In Ibis paper I use a conslanl ay in 
Ihe nolalion bul ay is allowed to depend on m and differenl values of ay can be used in Ihe prompl 
and displaced regions. This is also Irue of x: if can be chosen to be differenl values for differenl 
lesl masses and in Ihe prompl and displaced regions. Using variable ay and x does nol inlroduce 
any additional complexity. 

Finally, I note lhal one can run bolh Ihe Iwo-region profile likelihood lesl and a Iwo-sample 
test Appendix D shows lhal Ibis approach provides a small increase in performance in Ihe region 
near T ~ a(T); however, Ihere is an imporlanl assumption required to use Ihe Iwo-sample lest This 
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approach requires that all lifetime PDFs are the same in the signal and background regions, i.e., that 
locally the z PDF is independent of m. This may not be true for displaced backgrounds and would 
be difficult to validate unless the number of displaced-background candidates is large. Given that 
the gain in performance is small and the additional complexity introduced into the analysis is non- 
negligible, I conclude that unless one has a reason to expect T ~ a(T) and a method for validating 
the T PDF m dependence, that the two-region profile likelihood tesf is the best option. 

While it may be surprising to the reader that such a simple approach performs so well, it is 
often the case that analysis-specific information provides great power. In this case, factorizing the 
background into two categories using the known detector resolution is key to this search. Note 
that this procedure allows the analyst to completely ignore the m and z distributions for any back¬ 
ground that does not form a narrow peak relative to o{m). One instead relies on the fact that such 
backgrounds will populate the signal and sideband regions and that locally their PDFs are approx¬ 
imately linear. The possible deviation from linearity is incorporated into the likelihood via Oy. 
Narrow peaking backgrounds, e.g., J/xj/ ^ Aifi in the B —)■ K*iiix search, must be vetoed in each 
lifetime region unless it can be shown (or known) that they can only contribute to one region. Such 
backgrounds should be few and easy to identify. 

3. j!?-Values 

The full procedure involves first determining the local p-values at each m(test), then obtaining the 
global p-value of the most significant excess observed in the full mass range. An outline of the 
procedure is as follows: 

• The full mass range is scanned in steps of a(m)/2. 

• An independent test is performed for each m(test) where the signal and background regions 
are defined as |/n —m(tesf)| < 2o{m) and 3o{m) < |/n(tesf) — m\< {2x + 3)o{m), respec¬ 
tively {x should be optimized for each analysis). 

• The signal and background regions are divided into prompt and displaced sub-regions. The 

quantities are the inputs to the local two-region likelihood. 

• The profile likelihood provides the local test statistic (see Appendix A). Since the lifetime of 
the new particle is unknown, there is no assumed relationship between the signal rate in the 
prompt and displaced regions. 

• For each /n(test) an approximate local p-value is obtained using the asymptotic formula 
for the profile likelihood fesf stafistic. As discussed below, the accuracy of the asymptotic 
formula for this test only needs to be good enough to properly select the most significant 
local excess. 

• The minimum p-value from the full mass scan is selected as the test statistic to which a 
significance is to be assigned. This approach is motivated by the assumption that there is at 
most one new particle contributing to the data sample. 
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The global p-value is not the minimum local ji-value as this would ignore the so-called trials factor, 
or look elsewhere effect, induced by the fact that a large number of tests have been performed. 

Appendix E discusses the fact that this test has been designed such that it can be run on 
lOM data sets for ^(1000) m(test) values in about 2 hours on a single CPU core. This permits 
determining the global p-value without the need for using asymptotic formulae. The only reliance 
on asymptotic formulae is in selecting the minimum local p-value; therefore, the accuracy of the 
asymptotic formula only needs to be sufficient to properly select the most significant local excess. 
There is no need to interpret the local p-values as probabilities under the null hypothesis. 

Figure ^ shows the local p-values obtained from a single simulated toy data set. Since the step 
size in m(test) is smaller than the signal and sideband regions, neighboring tests are correlated. 
This produces a jagged-looking distribution. The test mass near 520 has the minimum p-value 
so would be selected to be assigned a significance in this data set. To obtain the significance the 
procedure is as follows: 

• Obtain the minimum local p-value from the data as described above. 

• Get an approximate null PDF from the data by ignoring the region near the most significant 
excess and obtaining a smooth PDF from the remaining data (interpolating into the removed 
region). Below I show that the details of how this is done are not important. 

• Generate an ensemble of simulated data sets from the PDF from the previous bullet point. 
The global p-value is the fraction of simulated data sets that have a minimum local p-value 
less than that observed in the data. 

• The number of generated data sets determines the statistical uncertainty on the global p- 
vaule. The most likely outcome, no evidence for a new particle, will require only i^(lOO) 
data sets. To confirm > 3a requires ^(1000) while > 5a requires ^(10^). 

• Confirming > 5a can be done on a single CPU core on a laptop. In the unlikely (and exciting) 
event that not a single simulated data set in 10^ has a minimum p-value less that that of the 
data, the asymptotic formula from Ref. [S] can be used to obtain an approximate significance 
(if one is desired). 

Figure ^ shows the cumulative distribution of minimum local p-values obtained from lOM toy data 
sets. One can see that to obtain a global 3a in this example requires a local p-value of about 
The trials factor then is ^(400) which is roughly the width of the full mass range divided 
by a(m) (that is 500 in this example). The asymptotic distribution provides an underestimate of 
the significance in this example for small p-values due to the small sample sizes used in each local 
test. Note that the true PDF used in the toy model is local-linear and so setting cjy/y = 0.1 is 
an overestimate of the local non-linearity. In this example, such an overestimate produces only a 
minor shift (towards lower significance; it produces a conservative estimate). 

Figure ^ shows that the cumulative p-value distribution obtained using an alternate (highly 
non-mono tonic) PDF but the same sample size and detector resolution (and, thus, test mass values). 
There is very little dependence on the data PDF. This means that the exact PDF used to generate the 
pseudo data sets is not important. For example, one could simply bin the data in a histogram with 



Figure 3. Local p-values obtained from a single simulated data set (x = 1). 



Figure 4. Cumulative distribution of minimum local p-values obtained using simulated (black) toy-model 
events compared to the (blue) asymptotic expectation®]. Two variations of the test are shown (both use 
X = 1): (solid) no uncertainty in the relationship between the signal and background regions and (dashed 
red) (Jy/y = 0.1. The discrepancy at very small p-values between the asymptotic and solid distributions is 
due to low statistics in each local test region. The data sets were generated using a PDF that is local linear 
so it is expected that using Gy/y = 0.1 is an overestimate of the scaling uncertainty which results in an 
underestimate of the significance. 


wide bins (relative to o{m)), remove the most-significant excess region, and use spline interpolation 
to obtain a background PDF (which interpolates into the removed most-significant region). This 
will be accurate enough to produce a reliable cumulative p-value distribution. 

To summarize: One can confirm up fo > 5a withouf using asymptotic formulae in about an 
hour on a laptop. Assignment of a significance beyond this level can be done using the asymptotic 
formula as an estimate (if this is desired). Figure ^ demonstrates that even an oscillatory PDF is 
handled naturally (I did not input any knowledge of this PDF to the method except that Oy/y = 0.1) 
provided the features of the PDF are wide relative to o{m). 

4. Limits 

Upper limits are to be set as a function of m and T for all m. In the event that a globally significant 
excess is observed, the analyst can additionally perform PDF-based fits to determine estimators for 
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Figure 5. Cumulative distribution of minimum local p-values obtained using simulated toy-model events 
distributed according to the PDFs shown at right (black is the nominal linear model, while red is a non¬ 
monotonic alternative). The distribution of local /r-values has little dependence on how the data is distributed 
as expected. This distribution is driven by the size of the mass range being searched and a{m). Both results 
here are shown for oy/y = 0.1 and x = 1. 


the new particle mass and lifetime. The two-region profile likelihood can be used to set the limits 
after making the following modifications to the likelihood function: 

• for each value of z, there is a relationship between the number of signal events expected in 
the prompt and displaced regions; 

• a Gaussian term is added to the likelihood to account for uncertainty (due to detector effi¬ 
ciency) in the fraction of signal expected in the prompt and displaced regions; 


• another Gaussian term is added to the likelihood to account for uncertainty in the absolute 
detector efficiency scale of the signal (most likely relative to a normalization decay mode). 


The likelihood for each (m(test), t) is then given by 




displ 


x,t|...) 


= L{n"\ 

X L(nfP‘,n 


n"\x\e -s-f, 


X 5^(/,/Mc(T),a(/)) x?^(e,£Mc(T),a(£)), 


(4.1) 


where / is the fraction of signal events in the prompt region with expected value from simulation 
/mc(t) and uncertainty cj(/), and £ is the efficiency (typically relative to some normalization 
reaction) with expected value from simulation £mc(t) and uncertainty cj(£). The limits are then 
set by scanning the profile likelihood using the same method, including the handling of special 
circumstances, discussed in detail in Ref. [Ell but using the likelihood given in Eq. For setting 
limits there is no need to generate lOM toy data sets; thus, I do not provide analytic solutions and, 
instead, use MlNUlT to numerically scan the profile likelihood. 

In the toy analysis I choose to normalize the new particle yield to the observed prompt yield 
in the full mass region. This emulates the situation where the prompt background is dominantly 

®This method actually returns a confidence interval whose lower limit may be > 0. Since the significance is discussed 
above, here I only study upper limits but the method will produce a lower limit as well. 
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Figure 6. Comparison between limits obtained for a single simulated data set from (black) the TRolke class 
(only Poisson uncertainties) and (dashed red) the method discussed in this work (with Oy/y = 0.1,;c = 1) 
for (left) test T = 0 and (right) T = 1000(7(t). In the left plot only the prompt region is used with TRolke, 
while in the right plot only the displaced region is used (TRolke only handles one region). For very small 
and large T, using only one region is an excellent approximation to the full method presented here; thus, the 
agreement between TRolke and this method for these T values is expected. 

a well-known SM process. I take the ratio of the efficiency for detecting the new particle to the 
normalization process to be one. In reality there will be some dependence of e on m and T and 
there will be some candidates in the prompt region that do not come from the normalization mode, 
but these just scale the limits (and contribute to a(e)) so I will not discuss them here. 

For the cases where the test T is <C o{'c) or ;:§> cj(t), to a good approximation only the prompt 
or displaced region matters. Therefore, in such cases the one-region results obtained using the 
TRolke class in ROOT [0 should be close to those produced here provided Oy/y is small. Fig¬ 
ure ^ shows that this is the case. With Oy/y = OA, the limits returned by this method are slightly 
larger than TRolke (which here takes the uncertainty on b to be purely Poisson). If = 0 then 
for small or large T the limits produced by this method are the same as TRolke. 

Figure 0 shows how the limits depend on T for a single toy-model data set. Since in this 
example the limits decrease with increasing z. Figure shows the dependence of 

the limits on z expected (obtained from the lOM toy data sets generated). For the toy-model data, 
whose PDF is uniform in m, the expected limits have no m dependence. In general this will not be 
the case. 

Figure 0 shows the coverage obtained for various configurations of this method and various 
background rates expected in the prompt and displaced regions. The coverage properties are good 
except for at small s. At small s the method over covers but this is expected and unavoidable. 
Otherwise, the method tends to over cover by only a few percent. Note that the small over coverage 
shown in Fig. 0 is due to the very low statistics of the samples studied. The possible observations 
are discrete, and with such low statistics it is not possible to obtain perfect coverage. 

5. Extensions 

The method is not restricted for use where the local-linear approximation is applicable. For exam¬ 
ple, in high-statistics searches, one could consider using multiple sideband regions for each ni(test) 
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Figure 7. Upper limits from a single data set on the ratio of new particle production to the SM reaction for 
T values (lighter to darker) T = (0,1,5,10,50,100,500, 1000)(7(t). 

and using spline interpolation (of whatever order is sufficiently constrained) instead of the local- 
linear approximation. Another approach could be to unblind some small fraction of the data and 
obtain estimates for the background PDF there. However the background estimates are obtained 
for each ni(test), they will have some uncertainty ay and so this method can still be applied using 
Eq. A. 10 in Appendix A. There is no restriction to any particular local background shape. 

6. Summary & Discussion 

This paper presents a simple likelihood-based approach for searching for a particle of unknown 
mass and lifetime in the presence of unknown non-monotonic backgrounds. Instead of exhaustively 
fitting the data with background PDFs containing hundreds of nuisance parameters, I propose to use 
the local-linear (or alternative local PDF) assumption and simple sideband counting. Deviations 
from the nominal local PDF are parametrized via a single parameter ay in the likelihood. If Oy is 
overestimated, then only minor underestimation of the p-values is found. This allows the analyst 
to concentrate on a small number of possible narrow peaking backgrounds and effectively ignore 
all other contributions. 

The lifetime information is used in a simple two-region approach which is nearly optimal 
except when T ~ <7 (t). This permits the avoidance of attempting to parametrize displaced back¬ 
ground PDFs from a few sparse observed data. This method is fast enough to verify significance 
> 5a in an hour on a laptop; thus, reliance on asymptotic formulae is not required. Furthermore, 
when setting limits only the integrated detector efficiency in each lifetime region is required to be 
determined for each mass. A detailed determination of the uncertainty on the detector efficiency 
vs lifetime is not required since the T information is only used to classify candidates as prompt or 
displaced. For limit setting, uncertainties on the integrated detector efficiency and on the fraction 
of events that fall in each region are included in the likelihood and the coverage is shown to be 
good. 

Finally, Fig. shows the low-recoil (high M(jU/i)) data observed by FHCb in the decay 
B — /f/ijuUH]. There is a clear and sizable contribution from the i//(4160) resonance. The size of 
this contribution was unexpected. What would have happened if a blind analysis of this data had 
been performed to search for a new prompt particle that decays to /i/r? If a tit using a monotonic 
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background PDF and no i/r(4160) term had been used^, then I estimate that the local /j-value near 
4200 would have given a local significance of 4 — 5a. Using the method presented in this paper (in 
only the prompt region), I estimate that the p-value is about 0.3(0.4) for Oy = 0(0.1) for x = 1. This 
is due to the fact that r(t/r(4160)) ~ 10a(/n) and so locally the resonant peak only deviates from 
local-linearity by about 15%. No false claim of a discovery would have been made. Furthermore, 
given the large number of nuisance parameters and sample size, a fit-based approach would not 
provide much (possibly no) additional sensitivity to new particles (no matter how much effort was 
put into developing the fit model). 
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^This is unlikely to have happened since prior to observing this data one would still have expected some charmonium 
contributions. The more likely scenario would be that a fit would have been performed that contained every Xj/ state 
with masses and widths free to vary within their nominal values; various other resonant shape parameters free to vary; 
and the relative phase of each amplitude free. This would then have required a serious systematic study to determine 
the p-values. Here I am simply using this as an example of how an unexpectedly large wide resonance contribution is 
handled naturally in the method presented in this paper. 
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Figure 8. Expected upper limits on the signal rate relative to the normalization mode vs m for simulated 
toy-model data (ffy/y = 0.1,x = 1). The lines are (black) mean, (dark gray) enclose the 1(7 and (light gray) 
2(7 intervals. From left to right, top to bottom T = (0,1,5,10,50,100,500, 1000)(7 (t). 
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Figure 9. Coverage obtained using (circles) x = 1,0), = 0, (squares) x = l,Oy = 0.1, (up triangles) x = 
10, Oy = 0, (down triangles) x = 10, Oy = 0.1 for (top) = 10,= 0.1 and (bottom) = 

10 ,^dispi _ jQ (left) T = 0, (middle) T = o(t) and (right) T = 10o(t). The dashed line shows the 

desired 95% coverage. This method typically over covers by a few percent for the low-statistics cases shown 
here. 



Figure 10. LHCb B —> Kfifi data observed at low recoil (image taken from Ref. []I3]). The peaking 
structure is due to the V^(4160). 
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A. Profile Likelihood 


The Poisson likelihood for observing events in the signal region and njj events in the background 
region is 

L{ns,nb\s,b) = ^{ns,s + b) ■ ^{nb,xb), (A.l) 

where the Poisson PDF is defined as 

(A.2) 

n\ 


and 5 is the signal rate, b is the background rate and x is the ratio of the size of the signal and 
background regions. The profile likelihood is fhen defined as 


K{s\ns,nb) 


L{s,b{s)\ns,nb) 

L{s,b\ns,nb) 


(A.3) 


where b[s) denofes fhe value of b fhaf maximizes fhe likelihood for fixed s, and s,b maximize L 
in general. These parameters can be obfained analytically* and are {s,b) = — nblx^Ublx) and 

b(0) = (ns + nb )/(I +x). In general, fhe estimator for b for any value of s is 


b{s) = ^s + nb - (1 +x)s + ^J{ris + Ub - (1 +x)sY + A{\ +x)snb^ /2(1 +x). (A.4) 


Asympfofically, —2 log A behaves as a wifh 1 DOF and so an approximafe p-value can be 
obfained from A. N.b., one may worry abouf fhe possibilify fhaf when 5 < 0 ^( 5 ) could become 
imaginary. See discussion af fhe end of fhis appendix on how fhe 5 < 0 case is deal! wifh. 

If is sfraighfforward fo accounf for uncertainfy in fhe relationship befween fhe sideband and 
search window yields by augmenting fhe likelihood as follows: 


L{ns,nb,x\s,b,y) = ^{ns,s + b) • ^{nb,yb)-^{y^x^Oy), 


(A.5) 


where fhe Gaussian PDF is defined as 

^(z,At,a) =(A.6) 

Now y is fhe scale factor thaf relafes fhe yields in fhe sideband and search regions whose PDF is 
faken fo be a Gaussian wifh mean x and uncerfainly of Oy. Following fhe same approach fo max¬ 
imizing L produces an algebraically infracfable sef of fhree equafions. Making fhe approximafion 
fhaf Oy/y <1/then to leading order in Oy/y 


s K, ns — nbjx 

y « x^ol{nblx-b(s,Oy = 0)) 
b « b{s,Oy = 0,x—^y), 


(A.7) 

(A.8) 

(A.9) 


where b{s^ Oy = 0) is the result given in Eq. A.4 and b{s^ Gy = 0,x —)■ y) uses the same equation but 
replaces x with y everywhere. Figure shows that this approximation is accurate out to p-values 
of about 1 ^( 10 ^'^) when the relative uncertainty on the scale factor is smaller than the relative 
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Figure 11. p-values obtained from the asymptotic formula for the profile likelihood as maximized (black 
points) numerically using MINUIT and (red line) using the analytic approximation. This example used 
rib = 50, X = 5 and oy/y = 0.1. The (blue squares) show the p-values for O) = 0 (no uncertainty in the 
scaling between regions). 

statistical uncertainty in the background rate. In my tests I find that for Oy/y = 0.1 this holds for rib 
up to about 200. 

For the case where Oy/y > the analytic approximation given is not valid so the values 

s,y,b must be found numerically, e.g., using MlNUlT. This is a valid approach but increases the 
CPU time required by a factor (^(100). In such cases, however, rib is large enough that the Poisson 
term for rib can be replaced by a Gaussian term. The likelihood is then given by 

L{ns,nb,x\s,b) = ^{ns,s + b) ■'^{b,nb/x,ab), (A.IO) 

where the statistical and scale factor uncertainties on the background rate are now included in a sin¬ 
gle term = {nb/x^){\+nbOy /x^). For large rib this reduces to Ob = {nblx^)Oy. The likelihood in 
this case can again be maximized analytically giving the following results: (f, b) = — riblx, nb/x) 

and for 5 = 0 

^(0) = ^ (^b/x-ol + ^J{ol-rib/xY+Aoln^ . (A. 11) 

Thus, it is possible to provide analytical solutions for all cases. 

When searching for a new particle the physical region is 5 > 0 and the test statistic used for 
discovery is K{s = 0) if 5 > 0. If 5 < 0, I take 5 = 0 which gives A = I. One would expect 
this to happen at half of all m(test) values considered which is handled naturally by the pseudo¬ 
experiment method when obtaining global p-values. When setting limits, I use the bounded method 
from Ref. [I7|] where the increase in the likelihood is taken from 5 = 0 instead of 5 for the case where 
5 < 0. This produces limits that are slightly more conservative but also never produces unphysical 
limits. 


^This requires differentiating logL with respect to j and b, setting these to zero, then solving the system of equations. 
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B. Resonances 


This appendix considers the relationship between deviations from the local-linear approximation 
due to resonance contributions. Figure |T^ shows the expected deviations from local-linear for a 
resonance with r/a(m) = 20. For this case, even if the resonance makes up close to 100% of the 
total PDF the choice x = 1 is still local-linear to about 10%. For smaller resonance contributions 
larger values of X are local-linear at this level. Figure |T^ and show similar plots for r/a(ni) = 10 
and r/a(/M) = 5. For the case T/o{m) = 10, the local-linear approximation is valid at the 10% 
level up to resonance contributions of about 50% of the total PDF, while for F /o{m) = 5 it is only 
valid at this level for small resonance contributions. 

These results are not surprising. For the case r/a(ni) = 5, the signal region (which is ±2a(ni)) 
contains almost half of the resonance probability. Any large contribution from such a resonance 
will need to be vetoed. Contributions from wide resonances, however, are safe even if they make up 
the entire PDF. For resonances with widths in the range 20 < T/o{m) < 5, applying a veto of the 
region \m — ni(resonance) | < F will typically be safe for any size resonance contribution. However, 
if it is known (or can be shown) that the resonance is not dominant, then such a veto may not be 
required. 
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Figure 12. Deviations from local-linear for a resonance with m = 1000 MeV, Y = 100 Mev, where 
(7(m) = 5 MeV. The left plots show the yield distribution (in arbitrary units) for various choices of res¬ 
onance contribution fraction. The right plots show the ratio of events expected in a signal region to the 
prediction from the sidebands for various test masses (x-axis) and for various sideband to signal region size 
ratios x. 
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Figure 13. Deviations from local-linear for a resonance with m — 1000 MeV, F = 50 Mev, where 
G{m) — 5 MeV. The left plots show the yield distribution (in arbitrary units) for various choices of res¬ 
onance contribution fraction. The right plots show the ratio of events expected in a signal region to the 
prediction from the sidebands for various test masses (x-axis) and for various sideband to signal region size 
ratios x. 
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Figure 14. Deviations from local-linear for a resonance with m — 1000 MeV, F = 25 Mev, where 
G{m) — 5 MeV. The left plots show the yield distribution (in arbitrary units) for various choices of res¬ 
onance contribution fraction. The right plots show the ratio of events expected in a signal region to the 
prediction from the sidebands for various test masses (x-axis) and for various sideband to signal region size 
ratios x. 
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C. Two-Sample Non-Parametric Goodness-of-Fit Tests 


Assuming that the lifetime distributions of all non-signal PDFs are the same in the signal and 
background regions, a two-sample non-parametric goodness-of-fit test to the hypothesis that the T 
PDF for the signal and background data is the same can be used to test whether a new particle with 
a resolvable lifetime is contributing to the data. A well-known test is the Kolmogorov-Smirnov 
(KSl firnil test which uses as test statistic 


r = MAx|F;(T)-F^(T)| 


(C.l) 


where F denotes the cumulative distribution. The asymptotic formula is used in this study to obtain 
an approximate p-value. 

Despite the KS test’s popularity in particle physics, it is known to not be as powerful as the 
Cramer-von-Mises tCvMl Iinil and Anderson-Darling (AD)[]T3] tests under many conditions. The 
CvM and AD tests build statistics 



(C.2) 


and 



(C.3) 


respectively. Both are based on the cumulative distributions like the KS test but instead of using 
only the maximum discrepancy, they use an integrated discrepancy. The AD test provides more 
weight to the “tails”. The approximate p-values for these tests are obtained using toy simulated 
data sets in this study. 

These two-sample tests are for shape only (in how they are used in this study). To test both 
size and shape two tests are run: (1) either the KS, CvM or AD tests for a shape comparison 
of sidebands to search window and (2) Poisson profile likelihood for yield comparisons. As the 
size and shape tests are uncorrelated (to very good approximation), Fisher’s method flTHl is used to 
combine the two p-values to get a single p-value. In this way both shape and size anomalies are 
tested and a single p-value is obtained. 
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D. Lifetime Test Results 


Figure [T^ shows the results of a study of lifetime-only based testing for a given test mass. Pseudo¬ 
datasets are generated using a Gaussian T PDF for prompt backgrounds, and an exponential with 
effective lifetime of lOa(T) for displaced backgrounds. Signal events are also generated using an 
exponential PDF (of varying choices of t) convolved with a Gaussian to mimic the resolution. The 
parameters for the background are taken to be < >= 10, < nf>= 0.1 and x = 5. In each 

case 10 signal events are added. The following tests are studied: 

• For comparison an “optimal” test is run where the true background PDF (including for dis¬ 
placed backgrounds) is used. This is a cheat since I assume that the displaced background 
PDF is unknown. The signal PDF is used but with T as a free parameter. 

• Pure shape-based two-sample tests (see Appendix C) are shown in the top left panel. There 
is not much difference between the KS, CvM and AD tests for this particular data set. As 
expected these tests provide no power for T <C (J(t) and increase in power with increasing T. 

• Profile likelihood tests are shown in the top right panel. The A test ignores lifetime infor¬ 

mation (hence its performance is independent of T). As expected this test is optimal for 
T ^ The A2 test is a two-region (prompt and displaced) counting experiment, where 

the likelihoods from each region are combined (via a product as usual). This simple test is 
nearly optimal except when T ~ a(T). Adding a third region does not improve the perfor¬ 
mance for this particular displaced background (it may if the displaced background had some 
additional structure). 

• The bottom left panel shows the results of performing the profile likelihood tests along with 
the shape-based KS test. This greatly enhances the performance of the KS test. 

• The bottom right panel, however, shows that performing the KS test with the two-region 
profile likelihood fesf only provides a small gain in fhe region T ~ a(T). 

While fhe combinafion of KS and A2 in fheory improves fhe performance for T ~ ct(t), if adds 
some complexify fo fhe analysis. For example, an addifional assumpfion has now been employed: 
fhaf all lifetime PDFs are fhe same in fhe signal and sideband regions. This is likely fo be frue for fhe 
prompf SM background; however, if may nof be frue for ofher fypes of background. Furfhermore, 
fhe gain in significance obfained by increasing fhe number of generated signal events from 10 to 11 
for T ~ a(T) is much greater than that obtained by performing the KS test along with the profile 
likelihood; i.e., fhe maximal gain in sensifivify in fhis example of using bofh fhe KS and A2 fesfs is 
< 10% in signal rale sensifivify. This gain requires adding an assumpfion which may nof be valid 
and is expecfed fo be difficull fo validale/study. Thus, my conclusion is fhaf fhe nominal fesf should 
simply be A2. 
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Figure 15. The (circles,squares) represent the fraction of data sets with p < (3,5)(7. The test labels are 
defined as follows: (opt) the cheat PDF-based likelihood fit; (KS,CvM,AD) the 2-sample tests (see text); 
(A(2,3)) the profile likelihood test, including the (2,3)-region versions; (/r5-|-A(2)) combination of both 
the KS and profile likelihood tests. N.b., the size of the markers has no meaning; this was done to aid in 
viewing markers for tests with similar power. 
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E. Fast Algorithm 


The two-region profile likelihood method ean be made extremely fast. In Appendix A I obtained 
analytic expressions that maximize the likelihood (including the case where a Gaussian uncertainty 
is included on the scaling factor that relates the sideband yields to those in the signal region); 
thus, no numerical minimization routine, e.g., MINUIT, needs to be run. Note that the data can be 
binned in mass (in both the prompt and displaced regions) to perform this test. This means that 
time consuming event loops are not required. Furthermore, as one scans the mass range, rather 
than summing up the yields in the signal and background regions, it is faster to simply subtract 
the bin(s) removed from each region and then add only the bin(s) added. Combining all of these 
optimizations results in a test that can be performed on lOM data sets with ^(1000) test masses 
in about 2 hours on a single CPU core. This process is trivially parallelized to run on multiple 
cores. Therefore, it is possible to confirm a significance of > 5a without the need to rely on an 
asymptotic p-value. This is a desirable feature since for rare decays the asymptotic formulae tend 
to underestimate the significance for very low p-values. 
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